Multi-integrated genomic data for Passiflora foetida provides insights into genome size evolution and floral development in Passiflora

Passiflora is a plant genus known for its extremely distinctive and colorful flowers and a wide range of genome size variation. However, how genome characteristics are related to flower traits among Passiflora species remains poorly understood. Here, we assembled a chromosome-scale genome of P. foetida, which belongs to the same subgenus as the commercial passionfruit P. edulis. The genome of P. foetida is smaller (424.16 Mb) and contains fewer copies of long terminal repeat retrotransposons (LTR-RTs). The disparity in LTR-RTs is one of the main contributors to the differences in genome sizes between these two species and possibly in floral traits. Additionally, we observed variation in insertion times and copy numbers of LTR-RTs across different transposable element (TE) lineages. Then, by integrating transcriptomic data from 33 samples (eight floral organs and flower buds at three developmental stages) with phylogenomic and metabolomic data, we conducted an in-depth analysis of the expression, phylogeny, and copy number of MIKC-type MADS-box genes and identified essential biosynthetic genes responsible for flower color and scent from glandular bracts and other floral organs. Our study pinpoints LRT-RTs as an important player in genome size variation in Passiflora species and provides insights into future genetic improvement. Graphical Abstract Supplementary Information The online version contains supplementary material available at 10.1186/s43897-023-00076-x.


Introduction
The genus Passiflora contains more than five hundred species worldwide, with most of the species distributed in the Americas (Cerqueira-Silva et al. 2014).They are generally classified into five main subgenera: Astrophaea, Decaloba, Tetrapathea, Deidamiodes, and Passiflora (Muschner et al. 2003;Sader et al. 2019).Many species in Passiflora are highly valued for their edible fruits and medicinal compounds.Passiflora edulis is the most economically important species of the Passiflora genus, and it consists of two main varieties: the purple passion fruit (P.edulis Sims) and the yellow passion fruit (P.edulis f. flavicarpa O. Deg).The fruits of P. edulis possess various nutritional and medicinal compounds including amino acids, oils, flavonoids, triterpenoids, and carotenoids (He et al. 2020;Fonseca et al. 2022).Passiflora foetida ("Long Zhu Guo" in Chinese) is a scrambling vine native to warmer regions of the Americas and the Caribbean.However, it has also become naturalized (and in some cases invasive) in various parts of the world, including Africa, South Asia, and Hawaii.Unlike other Passiflora species, P. foetida possesses persistent bracts that surround the flowers and remain through fruiting (Fig. 1a).What is even more peculiar is that these bracts produce sticky hairs which produce digestive enzymes and thus this species is considered a protocarnivorous plant (Radhamani et al. 1995).As the epithet suggests, P. foetida emits a strong odor when the leaves are handled.Despite this, the fruits are edible, and the plant has been used in traditional medicine (Patil and Paikrao 2012).Chemical compositions among different Passiflora species vary due to differential gene expression and metabolite accumulation.For instance, sucrose is the predominant soluble sugar in P. edulis (de Oliveira et al. 2014), whereas it only presents in low concentrations in P. foetida fruits (Song et al. 2018).Thus, understanding the genomic basis behind those metabolites is urgent and would be helpful for future improvements.
In addition to edible fruits and medicinal compounds, species in Passiflora are well known for their exquisitely complex flowers.It has been shown that the MADS-box genes play a crucial role in floral transition and floral organ development (Ng and Yanofsky 2001).The ABCE model suggests that different combinations of MIKCtype MADS-box genes are required to specify the identities of different floral whorls (Coen and Meyerowitz 1991;Thomson et al. 2017;Shan et al. 2019).According to the model, A-and E-class genes specify sepals; A-, B-, and E-class genes specify petals; B-, C-, and E-class genes specify stamens; C-and E-class genes specify carpels; and finally, D-and E-class genes specify ovules (Rijpkema et al. 2010).The formation and development of Passiflora flowers exhibit peculiarities in three aspects.First, there are extra whorls between the petals and stamens, which has challenged the classical ABCE model of flower development (Classen-Bockhoff and Meyer 2016).It has been proposed that these extra whorls initiate after the formation of other whorls and might be associated with increased copies of B-and C-class genes (Classen-Bockhoff and Meyer 2016; Scorza et al. 2017).Second, it has been suggested that the development of tendrils in Passiflora species are modified reproductive structures and are closely connected to floral initiation and development, which are also associated with MADS-box genes (Scorza et al. 2017;Hernandes-Lopes et al. 2019).Third, the seemingly three symmetrical bracts subtending the flowers are actually composed of one bract which is the first primordium differentiated from the bud complex, and two bracteoles that originated from the axillary meristem of the first bract (Hernandes-Lopes et al. 2019).In the case of P. foetida, the three bracts are enlarged and highly branched with secretory glands at the branch ends, which are important for defense from and digestion of potential pests (Radhamani et al. 1995).However, for species in the Decaloba subgenus, the bracts are completely absent (Soares et al. 2018).Although previous studies have explored these questions at the anatomical and developmental levels, it is also important to understand the function of MADS-box genes in these processes from the functional genomic perspective to better understand how such traits evolved.
How genome sizes change is an important question in evolutionary biology with various explanations.Importantly, TEs are repetitive sequences that undergo rapid turnover and can alter the size and architecture of plant genomes (Novak et al. 2020).The activity of TEs can lead to gene inactivation, reprograming of gene expression, and with DNA recombination, they can further lead to deletion, rearrangement, and transposition of genes (Lisch 2013).LTR-RTs, a common type of plant TEs, are typically categorized into two superfamilies, Copia and Gypsy (Wicker et al. 2007).Individual lineages of LTR-RTs can vary significantly in copy numbers, even among closely related species (Stritt et al. 2020).To better understand the biological importance of LTR-RTs, it is necessary to classify them at the lineage-or cladelevel and map their locations in the genome (Neumann et al. 2019;Zhang et al. 2022aZhang et al. , 2022bZhang et al. , 2022c)).The known genome sizes of Passiflora species range from the smallest (0.212 pg of DNA C-value) in P. organensis to the largest (2.68 pg) in P. quadrangularis (Souza et al. 2004;Yotoko et al. 2011).This ten-fold genome size variation appears to be associated with flower size in the Passiflora subgenus but not in the Decaloba subgenus (Yotoko et al. 2011).Following divergence from a common ancestor (estimated to be 1.16 pg in size), the genome sizes become progressively smaller in the lineages leading to P. foetida (0.56 pg), while the genome sizes become progressively larger in the lineages leading to the estimated common ancestor (1.89 pg) of P. alata and P. edulis (1.26 pg) (Yotoko et al. 2011).Using the short-read dependent RepeatExplorer pipeline and fluorescent in situ hybridization, researchers found that Angela, Tekay, and Athila LTR-RT lineages are the most abundant in some Passiflora species, including P. edulis, P. organensis, P. cincinnata, and P. quadrangularis (Pamponet et al. 2019;Sader et al. 2021).Currently, there are only two versions of chromosome-level genomes available for P. edulis (Ma et al. 2021;Xia et al. 2021) and a contig-level genome of P. organensis belonging to the Decaloba subgenus (Costa et al. 2021).However, these studies use different tools to identify LTR-RTs and lack specific details on clade-level TE classification.The incomplete availability of highquality genomes hinders a comprehensive understanding of the factors contributing to variation in chromosome architecture and genome size.
In this study, our primary goal was to generate a highquality genome assembly of P. foetida using a combination of short-and long-read sequencing, as well as Hi-C scaffolding such that more accurate inferences about genome size and structure could be made.As a result, our final genome assembly was 424.18 Mb in size and included the annotation of 30,584 protein-coding genes and 509,131 TEs.With this genome assembly and annotation, we analyzed the genome evolution and wholegenome duplication events of P. foetida in comparison to P. edulis.Our findings suggest that rapid removal of LTR-RTs in P. foetida and elevated insertion activity of LTR-RTs in P. edulis contributed to the difference in genome sizes between these species.Based on a combination of phylogenetic, transcriptomic, metabolomic and genome-wide analyses, we also studied genes related to specification of floral organ identity, pigments biosynthesis, and emission of volatile compounds in P. foetida.This work substantially improves our understanding of the formation of the complex floral organs in Passiflora and provides an important resource for further research and breeding.

Genome assembly and annotation
Passiflora foetida is a perennial scrambling vine that belongs to the Passifloraceae family in Malpighiales order.The small (3 ~ 4 cm) flowers of P. foetida have white petals with a pink or pale purple filamentous crown (Fig. 1b).The flowers and fruits of P. foetida are protected by three hairy bracts that can trap small insects (Fig. 1ac).The mature fruits are yellowish-orange and usually ~ 2 cm in diameter with hard black seeds (Fig. 1d).The tendril-aided climbing behavior of P. foetida allows the plant to climb trees, cables, fences, and other structures (Fig. 1e-g).
We sequenced and assembled the genome of P. foetida using a combination of short reads, Nanopore long reads, and Hi-C scaffolding (Additional file 2: Table S1).We first used Kmerfreq to estimate the size, repetitiveness, and heterozygosity of the P. foetida genome based on 25.70 Gb of Illumina paired-end reads.The result indicated an estimated genome size of 465.46 Mb with a high repeat content and low heterozygosity (Additional file 1: Fig. S1a).We then generated 62.15 Gb of Nanopore single-pass reads and employed NextDenovo and NextPolish to achieve a de novo genome assembly of 429.16 Mb with high contiguity (607 contigs, N50 = 13.06Mb).After removing 36 mitochondrial/ plastid contigs, we further used Hi-C data (87.50Gb filtered Illumina paired-end reads) to group the remaining contigs into ten pseudochromosomes using LACHESIS.The final Hi-C assembly we used for downstream analysis was 391.01 Mb for ten pseudochromosomes (LG01 to LG10) and 33.17 Mb for unplaced contigs (Contig1 to Contig344; Fig. 1h and Additional file 1: Fig. S2; Additional file 2: Table S2).The final assembly exhibited high mapping ratios of both short (99.63%) and long reads (99.39%), high completeness (98.36%) and consensus sequence accuracy (QV = 28.72,error rate = 0.0013) using Merqury, nearly no contamination using Blobtools (Additional file 1: Fig. S1b), very few redundant sequences using purge_haplotigs (Additional file 1: Fig. S1c), as well as Benchmarking Universal Single-Copy Orthologs (BUSCO; Simao et al. 2015) with a value of 98.76%.
We used the EDTA pipeline to annotate the TEs in the genome of P. foetida, revealing that the genome was made up of 66.71% identifiable repeats.The LTR assembly index (LAI) indicated a high level of genome contiguity (LAI = 18.44) of our assembly.By combining transcriptbased, homology-based and ab initio prediction results with EVM and the PASA pipeline, we identified a total of 30,584 protein-coding genes in the genome of P. foetida, with a BUSCO completeness of 94.2%.Furthermore, by searching against the SwissProt, (Gene Ontology) GO, Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam, and NR databases, we found that 97.07% of these genes were functionally annotated by at least one database (Additional file 2: Table S3).In addition, noncoding RNAs (ncRNAs) were comprehensively annotated in our study, including 1129 rRNAs, 1174 tRNAs, 1229 snRNAs, and 144 miRNAs (Additional file 2: Table S4).In summary, we successfully generated a well-annotated genome assembly of P. foetida with high contiguity and completeness (Additional file 1: Fig. S1h), which served as the basis for downstream analysis.

Genome evolution and whole-genome duplication
To investigate the genome evolution of P. foetida, we conducted a comparative analysis with the genomes of nine species, including six Malpighiales species (P.edulis, Salix purpurea, Populus trichocarpa, Manihot esculenta, Ricinus communis, Linum usitatissimum, and three other angiosperm species Arabidopsis thaliana, Vitis vinifera, and Nicotiana attenuata).In total, we identified 17,787 gene families, among which 278 were found to be singlecopy gene families.We constructed a phylogenetic tree based on these single-copy gene families and estimated the divergence time between P. foetida and P. edulis to be approximately 8.96 mya, which aligned closely with the result estimated from the TimeTree (Kumar et al. 2017) database (Fig. 2a).We next employed CAFE to analyze the expansion and contraction of gene families.The results showed that there were 117 expanded and 47 contracted gene families in P. foetida relative to its common ancestor with P. edulis (Fig. 2a).GO and KEGG enrichment analysis indicated that the expanded gene families were enriched in "ion binding" and "transcription factors", while the contracted gene families were enriched in "immune system process", "Ras signaling pathway", and "terpenoid biosynthesis" in P. foetida (Additional file 1: Fig. S3; Additional file 3: Table S5 and S6).
To analyze potential whole-genome duplication (WGD) events, we initially identified paralogous genes within syntenic blocks identified by MCScanX.By calculating the synonymous nucleotide substitution rate (Ks) of these genes, we observed a single peak for both P. foetida and P. edulis (Additional file 1: Fig. S4a), suggesting one recent WGD event shared by both species.After calibrating with the estimated divergent time, we estimated that this WGD event shared by P. foetida and P. edulis likely occurred 7.90-8.29 mya (million years ago) prior to their divergence (Fig. 2a).We also generated dot-plots of these paralogous genes within P. foetida, which further supported one recent WGD event and one ancient whole genome triplication (WGT) event (Fig. 2b, Additional file 1: Fig. S4b,c).Moreover, synteny analysis of the P. foetida genome assembly (2n = 20) with two versions of previously published P. edulis genome assemblies (2n = 18) revealed the correspondence of P. edulis Chr1 to Chr4 and Chr10 of P. foetida, which was possibly the result of a chromosome fission event according to the chromosome number known for this subgenus (Fig. 2c; Sader et al. 2019).Together, these results provide useful insights into the genome evolution of P. foetida in comparison with the congener P. edulis.

Comparative analysis of genome size
As P. foetida and P. edulis did not undergo independent WGD events after their divergence, the large difference in genome sizes may be attributed to the differential proliferation of TEs in P. edulis, and/or more rapid elimination of TEs in P. foetida.To address these questions, we first re-annotated the TEs in the two available P. edulis genomes (Ma et al. 2021;Xia et al. 2021).The result indicated that the total proportion of TEs was smaller in P. foetida (66.71%, 282.9 Mb out of 424.2 Mb) than P. edulis (83.94%, 1072.9Mb out of 1278.1 Mb and 87.29% 1171.2Mb out of 1341.7 Mb for the two published versions; Additional file 4: Table S7).Furthermore, our analysis revealed the number of LTR-RTs was considerably larger in P. edulis compared to P. foetida, with Gypsy-type LTR-RTs showing the most significant difference (Fig. 3).For instance, we extracted a syntenic block (Additional file 1: Fig. S5b) containing 10 genes both upstream and downstream of LG10.207 in P. foetida and observed a much smaller number of LTR-RTs in P. foetida (14 copies, 4502 bp total length) than in P. edulis (176 copies, 173,687 bp total length in BXG2020, and 465 copies, 595,948 bp total length in BXG2021).Thus, it appeared that the homologous gene of LG10.207 in BXG2020 (P_edulis060016494.g;Additional file 1: Fig. S5a) had undergone transposition to a different chromosome, and several other genes in BXG2021 within this microsyntenic region had been transposed from elsewhere (Additional file 1: Fig. S5b).This indicates that transposition of LTR-RTs could change the order and chromosomal location of genes, potentially providing opportunities for functional remodeling.
We then performed clade-level classification of all the EDTA-annotated LTR-RTs with TEsorter against the REXdb database of transposable element protein domains (Neumann et al. 2019;Zhang et al. 2022a), which successfully classified most LTR-RTs in P. edulis and P. foetida (Additional file 4: Table S8 and S9).We found that the two species exhibited varying copy numbers for different clades of LTR-RTs (Fig. 4).Among the high-copy LTR-RTs, two Copia clades (Angela and Tork) and three Gypsy clades (Athila, Tekay and Reina) have thousands more copies in P. edulis than in P. foetida, while only one Copia clade (SIRE) had more copies in P. foetida (Additional A recent whole genome duplication event (green dots with lower Ks values) and an ancient whole genome triplication event (purple dots with higher Ks values) could be recognized.c Syntenic blocks of paralogous genes between P. foetida and two versions of P. edulis file 4: Table S9).For intermediate-copy-number LTR-RTs, three Copia clades (Ale, Ikeros and Ivana) had several hundred more copies in P. edulis than P. foetida, while two Copia clades (Bianca and TAR) and two Gypsy clades (CRM and Galadrield) had only around one hundred more copies in P. edulis (Additional file 4: Table S9).Interestingly, for LTR-RTs with complete and ordered domains (predicted by TEsorter), the Copia clade SIRE again had fewer copies in P. foetida.This indicates that SIRE copies have been removed much faster in P. foetida.
To confirm this finding, we analyzed solo/intact LTR-RTs ratios for each clade of LTR-RTs in P. foetida and P. edulis and found that the high-copy clades Angela, SIRE, and Athila (except Tekay) had much higher ratios in P. foetida than P. edulis (Additional file 4: Table S10).The larger proportions of non-intact LTR-RTs in P. foetida indicates accelerated purging of LTR-RTs, which is consistent with a previous study on the changes of DNA C-values in these Passiflora lineages (Yotoko et al. 2011).Furthermore, we calculated the insertion time of LTR-RTs and observed that, for each clade of LTR-RTs, the average insertion time was more recent in P. foetida than in P. edulis (BXG2021; Fig. 4).The inconsistency observed in P. edulis (BXG2020) might be partially attributed to low assembly continuity (N50 = 70 kb), which can result in reduced assembly quality of repeat regions.Taken together, our analysis suggested that genome size difference between P. foetida and P. edulis is primarily due to the expansion of Angela, Athila, and Tekay LTR-RT lineages in P. edulis and more rapid elimination of the of Angela and Athila LTR-RT lineages in P. foetida.

Transcriptomic analysis of different floral organs in P. foetida
To better understand the genetic basis underlying the unique flowers in P. foetida, we analyzed the transcriptomic patterns of different floral organs.We first gathered a total of 172 fully open flowers and carefully dissected each flower into eight distinct parts based on floral structures.Starting from the outer whorls to the inner ones, the flowers of P. foetida consist of three pinnatifid bracts (Br), five green-white petaloid sepals (Se), five white petals (Pe), two whorls of corona filaments (with white outer radii (Ri) and the pigmented inner radii), four rows of very short pali (Pa), several inner structures (the nectary, operculum, limen, annulus), the androgynophore (Ag) column, five stamens (St), and one compound pistil (Pi) composed of three to four carpels (Additional file 1: Fig. S6a-d).The pali (Pa) and the inner structures located in the receptacle cup were collected as a single sample.We performed RNA-seq analysis on each of these samples/ whorls (each with three biological replicates).
From these data, we compared the differentially expressed genes (DEGs) between each pair of samples (Additional file 5: Table S11 and Additional file 6: Table S12).We defined the specifically highly expressed genes (SHEGs) for each sample group as the shared upregulated DEGs compared with the other sample groups (Additional file 1: Fig. S7; Additional file 7: Table S13).In the SHEGs of the Passiflora-specific radii (Ri), we found a APETALA2 (AP2) transcription factor (LG05.978) with a high-expression level.This gene, namely WRINKLED1 (WRI1), was originally shown to regulate oil accumulation in the seeds, and it could also regulate root auxin homeostasis and activate terpene biosynthesis through binding to AW-boxes in the promoters of target genes (Maeo et al. 2009;Kong and Ma 2018).We also found the numbers of SHEGs were among the smallest in sepals (Se) and petals (Pe), and the number of DEGs was also the lowest between sepals (Se) and petals (Pe) (Additional file 1: Fig. S7; Additional file 7: Table S13), which was consistent with the petallike appearance of the sepals.Conversely, the numbers of SHEGs were the largest in bracts (Br), stamens (St), and pistils (Pi) (Additional file 1: Fig. S7; Additional file 7: Table S13), which may be important for the specific physiological or developmental functions of these floral organs.GO and KEGG enrichment analyses for these SHEGs showed some interesting terms in samples of stamens and pistils (Additional file 1: Fig. S8 and S9; Additional file 7: Table S14 and S15).Thus, we further performed enrichment analyses for DEGs between these two sample groups and found that the upregulated DEGs in pistils (Pi) were enriched in "cellular nitrogen compound metabolic process", "ribosome biogenesis" and "ribonucleoprotein complex biogenesis", while the upregulated DEGs in stamens (St) were enriched in "cellular catabolic process" and protein post-translational modifications such as "protein polyubiquitination", "protein ubiquitination" and "protein modification by small protein conjugation" (Additional file 1: Fig. S10; Additional file 7: Table S16-19).These GO terms are similar to the study in female flowers of Populus balsamifera, which showed sex-biased expression of metabolic genes (Sanderson et al. 2019).Some of these enriched terms in the greener female floral organ might also suggest similar energetics.Due to the green color of the bracts (Br), most SHEGs of the Br are also related to photosynthesis.Besides, we also identified an HD-ZIP IV transcription factor GLABRA2 (GL2; LG08.3456).In Arabidopsis, AtGL2 was reported to play an important function in the development of leaf trichomes and root hairs (Rerie et al. 1994;Di Cristina et al. 1996).This P. foetida GL2 homolog might play similar roles in the development of the abundant glandular hairs used to trap insects found on the bracts, but additional experiments are required to confirm this hypothesis.Carnivorous plants invest lots of genes and resources to attract, capture, digest insects and to absorb the nutrients (Fukushima et al. 2017).Interestingly, we found a homolog of AMMONIUM TRANS-PORTER 1 (AMT1, LG05.214) in the SHEGs of Br, which is a pitcher-predominant transporter gene and carnivoryrelated gene shown in two previous studies on the pitcher plant Cephalotus and Dionaea muscipula, respectively (Scherzer et al. 2013;Fukushima et al. 2017).Other carnivorous genes including genes encoding chitinases, proteinases, etc., were not identified because they were also expressed in other floral organs or because these genes might be induced after capturing the pests (Saul et al. 2023).
Type-II MADS-box genes play a central role in controlling flowering time, floral meristem identity and floral organ identity (Ng and Yanofsky 2001).Copy number variation and the expression of these genes have contributed to the diversification of flower morphology in various species (Hsu et al. 2021;Hu et al. 2021;Zhang et al. 2022b).To understand the role of Type-II MADS-box genes in flower development of P. foetida, we conducted a combined analysis of phylogenetic, transcriptomic, and genomic data (Fig. 5 and Additional file 7: Table S20).We first constructed a phylogenetic tree of MADS-box genes with P. foetida, P. edulis, and A. thaliana (Fig. 5a).Among them, we identified 37 Type-II MADS-box genes in P. foetida.Notably, there are three copies of A-class genes (LG01.1184,LG03.1238,LG10.207) in P. foetida.The expression of all three A-class genes was found to be high in the bracts (Br), with one homolog (LG10.207,AP1) also expressed in sepals (Se) and petals (Pe) (Fig. 5b).We found five copies of B-class and four copies of C-class genes in P. foetida.Interestingly, an earlier study suggested that the expansion of B-class and C-class genes in P. caerulea was important for the formation of additional whorls between petals and stamens, with B-class genes being expressed at relatively lower levels than C-class genes in the radii (Ri) (Hemingway et al., 2011).In this study, we found that the expression of B-class genes was high in petals (Pe), pali (Pa), and stamens (St) but was low in radii (Ri), with one homolog (LG01.683)also expressed in sepals (Se) (Fig. 5b), which could be associated with its petal-like appearance (Baum and Whitlock 1999).This expression pattern was consistent with an earlier report where the expression of PISTILLATA (PI) was experimentally verified to be the highest in petals and stamens (Scorza et al. 2017).The expression of C-class genes was found to be high from radii (Ri) to the inner whorls (Fig. 5b).Therefore, the expression pattern of B-class and C-class genes in radii (Ri) of P. foetida is in line with that of P. caerulea, which implies a conserved mechanism underlying radii (Ri) development in different species of Passiflora.As expected, E-class genes exhibited widespread expression in all floral organs (Fig. 5b).Considering the differential expression patterns of these Type-II MADS-box genes, we further analyzed the promoter sequence of these genes, and we found that the patterns of TE distribution in the promoter regions of these genes varied greatly between different copies within each genome and among different genomes (Fig. 5c), which could potentially contribute to the modification and coordination of their expression levels.Together, our results support the importance of Type-II MADS-box genes in specifying the floral organ identity of P. foetida flowers, including the extra whorls.
Floral volatile organic compounds (VOCs) are a mixture of plant metabolites with multiple functions (Dotterl and Gershenzon 2023).To understand the chemical compositions of floral scent in P. foetida, we performed Gas Chromatography-Mass Spectrometry (GC-MS) analysis for bracts and the fully open flowers separately.We found that the bracts and the fully opened flowers of P. foetida were composed of both major and minor VOCs, in which the major shared VOCs were 2-methoxy-2methyl-propane, benzaldehyde, methyl salicylate, eugenol, and vitamin A aldehyde (Additional file 1: Fig. S5).The main unique compound detected in the flowers (without bracts) was methyl 2-methoxy benzoate (or benzoic acid, methyl ester; Fig. 7b).According to our transcriptomic data, the expression levels of the benzoic acid methyl transferases (BSMTs) genes were high in the radii (LG01.282)and bracts (LG01.700;Additional file 1: Fig. S11a; Additional file 8: Table S22).Moreover, the main unique compound detected in the bracts was linalool (or 1,6-Octadien-3-ol, 3,7-dimethy1-; Fig. 7a).Consistently, we found that the expression level of linalool synthase (LG07.309)was the highest in the bracts followed by sepals (Additional file 5: Table S11).According to the literature, the functions of these floral VOCs are under the selection of both pollinators and florivores/ herbivores, and often co-evolve with the color and morphology of flowers (Schiestl 2015).

Discussion
In this work, we present a chromosome-scale genome assembly of P. foetida.Through comparing repeat content with the close relative P. edulis and analyzing transcriptomic data of different floral organs and developmental stages, we broadly characterized the genomic underpinnings of genome size variation and floral traits in Passiflora species, establishing a solid foundation for future research endeavors and flower breeding programs.

The activity of transposable elements contributes to genome size variation between two Passiflora species
Genome sizes vary more than ten-fold between some Passiflora species.According to a previous study (Yotoko et al. 2011), the genome size of P. foetida is inferred to be smaller than that of the common ancestor between P. foetida and P. edulis while P. edulis is inferred to be larger, however what genomic changes were responsible for these differences in size was unknown.Genome size differences have been previously studied in P. organensis but the genome was only a draft version and this species was from the Decaloba subgenus (Costa et al. 2021).In the present study, we compared WGD/WGT events and TE contents based on chromosome-level genomes of P. foetida and P. edulis.We found no independent WGD events following the divergence of these two species suggesting that the disparity in genome sizes was not caused by polyploidization but was primarily driven by differences in TE activity, particularly the Gypsy superfamily of LTR-RTs (Fig. 3).After further characterization of LTR-RT lineages, we concluded that expansion of three LTR-RT lineages (Angela, Athila, Tekay) in P. edulis and more rapid elimination of two of them (Angela, Athila) in P. foetida were the main drivers of differences in genome size.Interestingly Angela, Athila, and Tekay LTR-RT lineages are all known for being especially long in LTR length, which may be associated with higher activity of homologous recombination activity (Neumann et al. 2019).In the grass species Brachypodium distachyon, the Angela lineage of LTR-RT was identified as the most active, with high copy number and high-turnover rates (Stritt et al. 2020).
An interesting question is what mechanisms underlie this differential removal of LTRs in P. foetida compared with P. edulis.A previous study in cotton (Gossypium) suggested that genome expansion mediated by episodic recent LTR-RT amplification was readily reversed by higher rate of DNA loss in smaller genomes (Hawkins et al. 2009), which was perhaps controlled by the intrinsic dynamics of TE proliferation and removal (Novak et al. 2020).Besides genomic factors, ecological factors such as generation time, population size, and nutrients were also proposed to affect the evolution of genome sizes (Leitch and Leitch 2012).In the Passiflora genus, genome size has been shown to correlate with cytological and phenotypical traits, such as flower size (Yotoko et al. 2011;Bugallo et al. 2023).Thus, these traits could be further under the selection by pollinators or by humans for ornamental values.As with having carnivorous bracts, P. foetida can live better in nutrient-deficient habitats, compared with other Passiflora species with larger genome size, because it needs fewer resources to build its smaller genome and smaller reproductive organs.Whether LTR-RT activity is the general mechanism of genome size variation, and whether the same LTR-RT lineages are involved in most cases of genome expansion in Passiflora will require the generation of more chromosome-level genome assemblies across the genus.
In addition to expanded genome size, TE insertions can play a crucial role in driving rapid phenotypic variation in plants.For instance, in natural populations of Capsella rubella, TE insertions have been shown to correlate with changes in flowering time by affecting the expression levels of the FLOWERING LOCUS C (FLC) gene (Niu et al. 2019).TE insertion polymorphisms (TIPs) were shown to be crucial in the domestication of Brassica rapa through modifying gene expression levels and altering gene structures (Cai et al. 2022).In the domestication of rice (Oryza sativa), TIPs were also frequently associated with changes in regulatory genes (Castanera et al. 2023).Thus, the dynamic patterns of TE distribution in the promoter regions of ABCE floral genes might be an important contributor to modifying the gene expression patterns (Fig. 5c), which could potentially result in different floral traits in Passiflora species.

MADS-box genes are important in the specialized floral development of P. foetida
Data from differential expression and copy number variation of MIKC-type MADS-box genes may challenge the classical ABCE model of floral organ development (Teo et al. 2019).For example, in Bougainvillea glabra, the expression of A-and B-class genes in bracts were associated with the large size and brilliant color of these subfloral organs (Zhang et al. 2022b).In orchids, B-class and AGAMOUS-like 6 (AGL6) genes evolved additional functions, including regulating anthocyanin accumulation and red spot formation in the flowers (Hsu et al. 2021).Before chromosome-level genomes were available, researchers studied the ontogeny of the unique flower features in Passiflora species based on quantitative Fig. 6 The expression levels of anthocyanin biosynthetic genes in P. foetida.Heatmaps show the expression levels of anthocyanin biosynthetic genes in small (FA), intermediate (FB), and large (FC) flower buds, as well as petals (Pe) and radii (Ri) of P. foetida.The genes were identified by BLAST searches.Abbreviations: PAL, phenylalanine ammonia-lyase; C4H, cinnamate-4-hydroxylase; 4CL, 4-coumarate CoA ligase; CHS, chalcone synthase; CHI, chalcone isomerase; F3H, flavanone 3-hydroxylase; F3'H, flavanone 3′-hydroxylase; F3'5'H, flavanone 3′,5′-hydroxylase; FLS, flavonol synthase; DFR, dihydroflavonol 4-reductase; ANS, anthocyanidin synthase; and UFGT, UDP-flavonoid glucosyltransferase Real-Time Reverse Transcription PCR (qRT-PCR) and in situ hybridization (Costa et al. 2019;Pamponet et al. 2019).These studies showed that A-class genes (AP1 and FRUITFULL [FUL]) in P. edulis were highly expressed in many floral organs, including bracts and tendrils (Scorza et al. 2017).Additional comparative studies in multiple Passiflora species also indicated that the AP1 gene might play important roles in regulating bract and tendril development (Hernandes-Lopes et al. 2019).
In the classical ABC model, expression of only A-class genes specifies sepals, expression of both A-and B-class genes specifies petals, expression of both B-and C-class genes specifies stamens, expression of only C-class genes specifies carpels (Coen and Meyerowitz 1991).In our Volatile compounds were identified by GC-MS.The peaks were compared against NIST library using Xcalibur software.Names of compounds (black integer labels in the graphs) were shown on the right analyses, the combination of phylogenetic and transcriptomic data showed that these genes were expressed in multiple whorls, including the extra whorls between the petals and stamens.It appeared that some copies of the floral ABC genes still exhibited classical expression patterns in the four whorls of sepals, petals, stamens, and carpels (Fig. 5b).All three A-class genes were expressed in the glandular bracts, while only one copy showed high expression levels in the sepals, petals and additional whorls.This normally very tiny floral organ was not included in the classic ABC model, but could also have ornamental values in some species, such as Curcuma alismatifolia (Liao et al. 2022).According to the expression patterns, the B-class gene LG08.1025appeared to act as a B-function gene, because it was expressed at very low levels in sepals and pistils, but highly expressed in petals and stamens (Fig. 5).However other B-class-like genes, LG01.683 in particular, were expressed in sepals and pistils at high levels.The four C-class genes were assigned to two groups based on phylogenetic relationships and similar expression patterns.Thus, expansion of B-and C-class genes and their combined dosage might be one of the factors underlying the unique floral features in Passiflora species.Moreover, we also compared the chromosomal locations and flanking sequences around A-class genes in P. foetida and P. edulis.The results from these analyses indicated that these genes were being rapidly translocated potentially because of TE-mediated recombination processes and that TEs were more abundant in the promotors of A-class genes in P. edulis (Additional file 1: Fig. S5).This additional genome plasticity might contribute to the diversification of flower morphology and pollination strategies of Passiflora species.With chromosome-level genome assemblies, we were able to study MADS-box genes in more detail, which has helped to improve our understanding of the development of the distinctive Passiflora flowers and will enhance the ornamental value through marker assisted breeding.

Floral color and scent in P. foetida
Recently, it was shown that the flowers of P. foetida exhibited both self-pollination and cross-pollination, with the intermediate curvature of stigma facilitating the pollinators (Mallikarjuna 2023).As suggested by our transcriptomic data, the biosynthesis and accumulation of anthocyanin occurred in late stages of floral development.The production of the visual signal (anthocyanin) only when needed saves energy and resources for P. foetida, which in addition could be learned and selected by pollinators (mutualists).
Besides, the formation of floral scent or VOCs is in conflict selection by both pollinators and florivores/ herbivores (Schiestl 2015).The methyl benzoate emitted by the fully open flowers of P. foetida can be a signal to attract pollinators such as bees (Additional file 1: Fig. S6c), while remain highly toxic towards certain pests (Zhao et al. 2022).Interestingly, in Petunia RNAi lines that cannot produce methyl benzoate, damage rates were dramatically reduced compared with normal flowers, which emitted very similar VOCs as P. foetida (Kessler et al. 2013).Generally, whether it is beneficial or harmful to the plants depends on the relative abundance of pollinators and florivores/herbivores in the environment and the strength of VOC emission (Schiestl 2015).The glandular bracts of P. foetida emitted a major scent component linalool (Additional file 1: Fig. S11a).This common VOC can function in both floral defense and pollinator attraction depending on the ratio of the two enantiomers, which in turn is determined by the spatiotemporal expression of biosynthetic genes (Raguso 2016).Perhaps in P. foetida, linalool emitted by the bracts was more a defensive compound, which required to be verified by advanced metabolomics technologies (Yan et al. 2022).Further studies are needed to completely understand the ecological roles of floral scent in P. foetida.

Conclusion
In conclusion, our newly assembled genome and the transcriptomic data of P. foetida provide a valuable new resource for comprehending the important roles played by LTR-RT lineages in genome size variation.Furthermore, our findings shed light on the potential roles of expression and copy number variation of MADS-box genes to flower development in Passiflora species.

Plant materials
Branch tips, including young leaves and tendrils of P. foetida, were collected from one plant near Luogu Mountain Park (114.4926°E, 22.6007° N) at Dapeng New District, Shenzhen, Guangdong province, China for Illumina, Oxford Nanopore Technologies (ONT), and Hi-C sequencing.The leaves, stems, tendrils, young fruit, and flower buds were used for RNA-seq and were collected from multiple plants.A total of 172 flowers were collected from one plant and kept frozen until the different flower parts could be dissected.The samples of floral organs were ground thoroughly in liquid nitrogen for RNA extraction and RNA sequencing.

DNA extraction, library construction and sequencing
Genomic DNA was prepared by the sodium dodecyl sulfate (SDS) method followed by purification with QIAGEN ® Genomic DNA extraction kit (Cat#13343, QIAGEN).Purified DNA was quantified using Nanodrop (ND-1000, Thermo Scientific, Wilmington, DE, USA) and Qubit ® 3.0 Fluorometers (Invitrogen, USA).For Illumina sequencing, libraries were prepared with fragmented DNA (200-400 bp) through end-repair, 3′ adenylated, adapter-ligation, PCR amplification with the products recovered by the AxyPrep Mag PCR clean up Kit.The double stranded PCR products were heat denatured and circularized by the splint oligo sequence and sequenced on an MGISEQ instrument.For ONT sequencing, additional steps after DNA extraction included selecting long DNA fragments using the BluePippin system (Sage Science, USA), A-ligation repair of DNA ends using NEB-Next Ultra II End Repair/dA-tailing Kit (Cat# E7546), and adapter ligation using the LSK109 kit.The resulting ONT library were sequenced on a Nanopore Prome-thION sequencer (Oxford Nanopore Technologies, UK) instrument at Nextomics.To construct the Hi-C library for sequencing on an MGI-2000 platform, freshly collect tissues were first fixed using 2% formaldehyde (glycine was used to stop crosslinking) and ground into powder and resuspended to isolate the nuclei.Then, DpnII, biotin-14dCTP, and T4 DNA polymerase were sequentially added.The ligated DNA was sheared into 300-600 bp fragments, and then was blunt-end repaired and A-tailed, followed by purification through biotin-streptavidinmediated pull down.

Survey of genome size and heterozygosity
To estimate the genome size and heterozygosity, 23.71 Gb of clean Illumina paired-ended reads were analyzed using Kmerfreq (17-mer) in gse (v1.0.2) software.

Whole-genome duplication analysis
The synonymous substitution rate (K S ) of homologous genes in MCScanX-identified syntenic blocks and was calculated to determine recent WGD or WGT events.Dot-plots were generated to further assess WGD events.

Estimation of LTR insertion time
Solo and intact LTR TEs were calculated by the scripts "solo_finder.pl"and "intact_finder_coarse.pl" in LTR_ retriever (Ou and Jiang 2018).The solo/intact ratios were calculated by "solo_intact_ratio.pl".LTR insertion time was also calculated by LTR_retriever, where μ = 5.35e-9 (using estimated absolute rates of silent-site divergence from De La Torre et al. 2017).

RNA extraction, cDNA preparation, sequencing and analysis
Total RNA from each sample was extracted using the RNAprep pure Plant Kit (TIANGEN) and was purified using poly-T beads.After synthesis, cDNA was first end-repaired, A-tailed, adapter-ligated, PCR-amplified, denatured, and circularized.The resulting products were quality-controlled using gel electrophoresis, Nanodrop (ND-1000, Thermo Scientific, Wilmington, DE, USA), and an Agilent 2100 Bioanalyzer System.RNA sequencing was performed using the MGIseq2000 platform.Raw reads were first quality controlled using SOAPnuke (v2.1.0;Chen et al. 2018) and further filtered using Trimmomatic (v0.39;Bolger et al. 2014).The resulting clean reads were aligned against the P. foetida genome assembly.StringTie (v2.1.4;Pertea et al. 2015) was used to get the Fragments per Kilobase Million (FPKM) value for each transcript.PCA plots, correlation heatmap, and the correlation coefficient matrix was generated using the R package ggplot2.DEGs between sample pairs were analyzed using the R package DESeq2.KEGG and GO enrichment analysis was performed using the R packages KEGGREST (v1.30.1) and AnnotationForge (v1.32.0), respectively.

Identification and analysis of MADS-box and floral metabolite genes
First, candidate MADS-box genes from P. foetida and P. edulis were identified by hmmsearch (v3.3.2;cutoff e-value < 0.1) using downloaded hidden Markov model (HMM) profiles of SRF-type-I (PF00319) and MEF2type-II (PF09047) as queries, combining the BLASTp (cutoff e-value < 0.0001) results using Arabidopsis homologs as queries (Parenicova et al. 2003).Then these candidate proteins were searched for conserved domains in the Pfam database.MIKC-type MADS-box genes with a K-domain were aligned by MAFFT (v7.490;Katoh and Standley 2013) and were used to build a maximum likelihood tree using IQTree2 (v2.0.7;Minh et al. 2020).TE contents surrounding these MADS-box genes were analyzed using MCScanX and visualized using LINKVIEW2 (v1.0.5).
Genes involved in the biosynthesis of anthocyanin and methyl benzoate were identified by BLASTP, in which the azalea (Rhododendron simsii) homologs were chosen as queries because of the relatively complete annotation of these genes in this species.All search results were compared to the results of genome annotation.

Metabolomic analysis of volatile compounds
Volatile compounds were collected by solid-phase microextraction.Samples from the bracts (Br) and the remaining parts of six fully open flowers of P. foetida were collected and cut to fit into 15 ml headspace bottles with three replicates.These collected samples were settled in the bottle for 2 hours, inserted with the extraction fiber for 0.5 hour respectively, and then inserted into the GC injection port for 3 min at 250 °C.The extracted compounds were measured using a GCMS-QP2010 system (Shimadzu, Japan), equipped with a DB-5MS column (30 m × 0.25 mm, 0.25 ms).He (99.999%) was used as carrier gas (split injection mode, 27.2 mL/min total flow rate).Ion source and interface temperatures were 200 °C and 250 °C, respectively.The detailed GC-MS run parameters used in the current study are described previously in detail (Zhang et al. 2022c), i.e., 1 KV detector voltage and 30 to 500 m/z full scanning mass range, 40 min total heating time (50 °C, 1 min; increasing 5 °C per minute; 250 °C, 2 min).The identities of major volatile compounds based on the chromatograms and mass spectrograms were compared with the NIST2011 library and were checked against the Pherobase database, the hit with maximal probability was kept for each peak (http:// www.phero base.com).

Fig. 1
Fig. 1 Habit and morphology of Passiflora foetida and its genome assembly.a-g Photos show the (a) flower bud surrounded by glandular bracts, (b) fully opened flower, (c) young fruit with persistent glandular bracts, (d) mature fruit and (e-g) examples of the climbing habit.h Circos plot represents the ten pseudochromosomes of P. foetida.Tracks represent (i) the density of Copia-LTR-RTs, (ii) the density of Gypsy-LTR-RTs, (iii) the distribution of other repeat elements, (iv) gene density, and (v) GC content.These metrics were calculated in 500 Kb windows.The arcs at the center represent collinear blocks between chromosomes containing 15 or more genes

Fig. 2
Fig. 2 Evolutionary analysis of P. foetida.a A phylogenetic tree of ten species including P. foetida is shown based on 278 single-copy genes.Expanded and contracted gene families and divergence times are shown at tips and nodes.b A dot plot shows paralogous genes in P. foetida.A recent whole genome duplication event (green dots with lower Ks values) and an ancient whole genome triplication event (purple dots with higher Ks values) could be recognized.c Syntenic blocks of paralogous genes between P. foetida and two versions of P. edulis

Fig. 3
Fig. 3 The transposable element content in P. foetida and P. edulis.a A tree of P. foetida and two P. edulis assemblies with genome sizes in parentheses.b Stacked bar plot shows the proportions of major groups of transposable elements in P. foetida and P. edulis.c Histogram shows the total length of different classes of transposable elements in each genome

Fig. 4
Fig. 4 Clade-level insertion times and copy numbers of LTR-RTs in P. foetida and P. edulis.Box plots (left panel) show the distribution of inferred insertion times of all LTR-RTs for each clade.Bar plots (right panel) show clade-level copy numbers of all LTR-RTs in P. foetida and P. edulis

Fig. 5
Fig. 5 Phylogenetic and gene expression analysis of MADS-box genes in P. foetida.a A phylogenetic tree of Type-II MADS-box genes in P. foetida (orange), P. edulis (green for BXG2020, blue for BXG2021) and Arabidopsis thaliana (black).b Gene expression levels of Type-II MADS-box genes in bracts (Br), sepals (Se), petals (Pe), outer and inner radii (Ri), pali (Pa), androgynophore (Ag), stamens (St), and pistils (Pi).The bars below the photos of floral organs indicate the glandular bracts (blue), the classical four whorls (black), and the extra whorls (pink).The copies exhibiting classical expression patterns in the classical ABC model are shown below the bars.c Plots show transposable elements (TEs) in the 5-kb flanking regions of the ABCE MADS-box genes.Red and orange boxes indicate the genes are in forward or reverse orientation, respectively.Different types of TEs are indicated by black (Class I), blue (Class II), and gray (other) boxes.Genes with LTRs in the promoter (upstream) regions are indicated by red stars

Fig. 7
Fig. 7 Volatile compounds in P. foetida flowers.a, b GC peaks for three replicates of (a) bracts and (b) the remaining parts of P. foetida flowers.Volatile compounds were identified by GC-MS.The peaks were compared against NIST library using Xcalibur software.Names of compounds (black integer labels in the graphs) were shown on the right